Bilingual Embeddings and Word Alignments for Translation Quality Estimation

نویسندگان

  • Amal Abdelsalam
  • Ondrej Bojar
  • Samhaa El-Beltagy
چکیده

This paper describes our submission UFAL MULTIVEC to the WMT16 Quality Estimation Shared Task, for EnglishGerman sentence-level post-editing effort prediction and ranking. Our approach exploits the power of bilingual distributed representations, word alignments and also manual post-edits to boost the performance of the baseline QuEst++ set of features. Our model outperforms the baseline, as well as the winning system in WMT15, Referential Translation Machines (RTM), in both scoring and ranking sub-tasks.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bilingual Word Embeddings for Phrase-Based Machine Translation

We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly out-perform baselines in word semantic similarity. A single semantic simil...

متن کامل

Convolution-Enhanced Bilingual Recursive Neural Network for Bilingual Semantic Modeling

Estimating similarities at different levels of linguistic units, such as words, sub-phrases and phrases, is helpful for measuring semantic similarity of an entire bilingual phrase. In this paper, we propose a convolution-enhanced bilingual recursive neural network (ConvBRNN), which not only exploits word alignments to guide the generation of phrase structures but also integrates multiple-level ...

متن کامل

Bilingual Word Embeddings from Parallel and Non-parallel Corpora for Cross-Language Text Classification

In many languages, sparse availability of resources causes numerous challenges for textual analysis tasks. Text classification is one of such standard tasks that is hindered due to limited availability of label information in lowresource languages. Transferring knowledge (i.e. label information) from high-resource to low-resource languages might improve text classification as compared to the ot...

متن کامل

Learning Bilingual Projections of Embeddings for Vocabulary Expansion in Machine Translation

We propose a simple log-bilinear softmaxbased model to deal with vocabulary expansion in machine translation. Our model uses word embeddings trained on significantly large unlabelled monolingual corpora and learns over a fairly small, wordto-word bilingual dictionary. Given an out-of-vocabulary source word, the model generates a probabilistic list of possible translations in the target language...

متن کامل

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

We introduce BilBOWA (Bilingual Bag-ofWords without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data. Instead it trains directly on monolingual data and extracts a bilingual signal from a smaller set of raw-text sentence-alig...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016